Deep Imitation Learning for Parameterized Action Spaces
Authors
Abstract
Recent results have demonstrated the ability of deep neural networks to serve as effective controllers (or function approximators of the value function) for complex sequential decision-making tasks, including those with raw visual inputs. However, to the best of our knowledge, such demonstrations have been limited to tasks with either fully discrete or fully continuous actions. This paper introduces an imitation learning method to train a deep neural network to mimic a stochastic policy in a parameterized action space. The network uses a novel dual classification/regression loss mechanism to decide which discrete action to select as well as the continuous parameters to accompany that action. This method is fully implemented and tested in a subtask of simulated RoboCup soccer. To the best of our knowledge, the resulting networks represent the first demonstration of successful imitation learning in a task with parameterized continuous actions.
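To make the dual classification/regression idea concrete, the following is a minimal sketch of a dual-head policy network trained with a combined imitation loss. It is an illustration under assumptions, not the authors' implementation: the class name ParamActionPolicy, the shared trunk, the single concatenated parameter vector, and the weighting coefficient beta are choices made for the example.

```python
# Minimal sketch of a dual-head policy for a parameterized action space:
# a classification head picks the discrete action, a regression head outputs
# the continuous parameters. Names and sizes are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParamActionPolicy(nn.Module):
    def __init__(self, state_dim, n_actions, param_dim, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.action_head = nn.Linear(hidden, n_actions)  # discrete-action logits
        self.param_head = nn.Linear(hidden, param_dim)   # continuous parameters

    def forward(self, state):
        h = self.trunk(state)
        return self.action_head(h), self.param_head(h)

def imitation_loss(policy, states, expert_actions, expert_params, beta=1.0):
    """Cross-entropy on the expert's discrete choice plus mean-squared error
    on the continuous parameters, combined with an assumed weight beta."""
    logits, params = policy(states)
    classification = F.cross_entropy(logits, expert_actions)
    regression = F.mse_loss(params, expert_params)
    return classification + beta * regression
```

In practice one would likely restrict the regression term to the parameters of the action the expert actually took; the sketch regresses the full parameter vector for brevity.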
Similar references
Deep Reinforcement Learning in Parameterized Action Space
Recent work has shown that deep neural networks are capable of approximating both value functions and policies in reinforcement learning domains featuring continuous state and action spaces. However, to the best of our knowledge no previous work has succeeded at using deep neural networks in structured (parameterized) continuous action spaces. To fill this gap, this paper focuses on learning wi...
Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces
Policy optimization methods have shown great promise in solving complex reinforcement and imitation learning tasks. While model-free methods are broadly applicable, they often require many samples to optimize complex policies. Model-based methods greatly improve sample efficiency but at the cost of poor generalization, requiring a carefully handcrafted model of the system dynamics for each task....
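As background for the estimator terminology used above, the toy sketch below contrasts a score-function (REINFORCE) gradient with a pathwise gradient obtained through a Gumbel-Softmax relaxation of a categorical action. The reward values and temperature are assumed, and this is a generic illustration rather than the cited paper's combined estimator.

```python
# Toy contrast of the two gradient estimators for a 4-way categorical policy.
# Reward values are assumed for illustration; not the cited paper's algorithm.
import torch
import torch.nn.functional as F

logits = torch.zeros(4, requires_grad=True)      # policy parameters
values = torch.tensor([1.0, 0.0, 2.0, 3.0])      # assumed reward of each action

# Score-function (REINFORCE) estimator: differentiates the log-probability
# of a hard, non-differentiable sample, weighted by its reward.
probs = F.softmax(logits, dim=-1)
action = torch.multinomial(probs, 1).item()
reinforce_loss = -values[action] * torch.log(probs[action])

# Pathwise estimator: a Gumbel-Softmax relaxation makes the sample itself
# differentiable, so the reward can be backpropagated through the soft action.
soft_action = F.gumbel_softmax(logits, tau=0.5)
pathwise_loss = -(soft_action * values).sum()

(reinforce_loss + pathwise_loss).backward()      # both gradients flow into logits
```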
Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm
In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, the Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...
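For context on the algorithm named above, the following is a textbook-style sketch of a single DDPG update step. The function signature, batch layout, and the coefficients gamma and tau are assumptions; it is not the cited paper's implementation.

```python
# Generic single DDPG update step (illustrative sketch; names and batch
# layout are assumptions, not the cited paper's code).
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                batch, actor_opt, critic_opt, gamma=0.99, tau=0.005):
    s, a, r, s2, done = batch  # tensors; r and done shaped like the critic output
    # Critic: regress Q(s, a) toward the bootstrapped one-step target.
    with torch.no_grad():
        q_target = r + gamma * (1.0 - done) * target_critic(s2, target_actor(s2))
    critic_loss = F.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: deterministic policy gradient, ascend Q(s, actor(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Polyak-average the target networks toward the online networks.
    with torch.no_grad():
        for net, target in ((critic, target_critic), (actor, target_actor)):
            for p, tp in zip(net.parameters(), target.parameters()):
                tp.mul_(1.0 - tau).add_(tau * p)
```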
Learning Deep Policies for Physics-Based Manipulation in Clutter
Uncertainty in modeling real-world physics makes transferring traditional open-loop motion planning techniques from simulation to the real world particularly challenging. Available closed-loop policy learning approaches for physics-based manipulation tasks typically either focus on single-object manipulation or rely on imitation learning, which inherently constrains task g...
Publication date: 2016